Overview

Dataset statistics

Number of variables: 16
Number of observations: 9702
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 MiB
Average record size in memory: 128.0 B

Variable types

Numeric: 9
Categorical: 7

Alerts

Unnamed: 0 is highly correlated with ID (High correlation)
ID is highly correlated with Unnamed: 0 (High correlation)
NAME_FAMILY_STATUS is highly correlated with CNT_FAM_MEMBERS (High correlation)
CNT_FAM_MEMBERS is highly correlated with NAME_FAMILY_STATUS (High correlation)
CODE_GENDER is highly correlated with FLAG_OWN_CAR and 1 other field (High correlation)
FLAG_OWN_CAR is highly correlated with CODE_GENDER (High correlation)
NAME_INCOME_TYPE is highly correlated with OCCUPATION_TYPE and 1 other field (High correlation)
NAME_FAMILY_STATUS is highly correlated with CNT_FAM_MEMBERS (High correlation)
OCCUPATION_TYPE is highly correlated with CODE_GENDER and 1 other field (High correlation)
CNT_FAM_MEMBERS is highly correlated with NAME_FAMILY_STATUS (High correlation)
AGE is highly correlated with NAME_INCOME_TYPE (High correlation)
Unnamed: 0 is uniformly distributed (Uniform)
Unnamed: 0 has unique values (Unique)
ID has unique values (Unique)
OCCUPATION_TYPE has 300 (3.1%) zeros (Zeros)
YEARS_EMPLOYED has 1694 (17.5%) zeros (Zeros)

Reproduction

Analysis started: 2022-05-10 22:14:30.277308
Analysis finished: 2022-05-10 22:15:06.760447
Duration: 36.48 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 9702
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4850.5
Minimum: 0
Maximum: 9701
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 75.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 485.05
Q1: 2425.25
median: 4850.5
Q3: 7275.75
95-th percentile: 9215.95
Maximum: 9701
Range: 9701
Interquartile range (IQR): 4850.5

Descriptive statistics

Standard deviation: 2800.87049
Coefficient of variation (CV): 0.5774395402
Kurtosis: -1.2
Mean: 4850.5
Median Absolute Deviation (MAD): 2425.5
Skewness: 0
Sum: 47059551
Variance: 7844875.5
Monotonicity: Strictly increasing
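
As a rough cross-check of the quantile and descriptive statistics reported for Unnamed: 0, the sketch below recomputes them with Python's standard library. It assumes the column holds exactly the integers 0 through 9701 (consistent with the reported minimum, maximum, distinct count, and strictly increasing order), that the variance uses the n - 1 denominator, and that quantiles are linearly interpolated. These are assumptions about how the figures were produced, not something the report states.

```python
import statistics

# Assumption: the column holds exactly the integers 0..9701 (9702 unique,
# strictly increasing values, as the summary above suggests).
values = list(range(9702))

mean = statistics.fmean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation

# "inclusive" interpolates linearly between the observed minimum and maximum.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Median absolute deviation: median of the absolute distances to the median.
mad = statistics.median(abs(v - median) for v in values)

print(f"mean={mean} variance={variance} std={std:.5f} cv={cv:.10f}")
print(f"Q1={q1} median={median} Q3={q3} IQR={iqr} MAD={mad}")
```

Under these assumptions the recomputed values should agree with the figures listed above, which suggests that Unnamed: 0 is simply a saved row index rather than a feature.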
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 1 | < 0.1%
6462 | 1 | < 0.1%
6464 | 1 | < 0.1%
6465 | 1 | < 0.1%
6466 | 1 | < 0.1%
6467 | 1 | < 0.1%
6468 | 1 | < 0.1%
6469 | 1 | < 0.1%
6470 | 1 | < 0.1%
6471 | 1 | < 0.1%
Other values (9692) | 9692 | 99.9%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Value | Count | Frequency (%)
9701 | 1 | < 0.1%
9700 | 1 | < 0.1%
9699 | 1 | < 0.1%
9698 | 1 | < 0.1%
9697 | 1 | < 0.1%
9696 | 1 | < 0.1%
9695 | 1 | < 0.1%
9694 | 1 | < 0.1%
9693 | 1 | < 0.1%
9692 | 1 | < 0.1%

ID
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 9702
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5076115.117
Minimum: 5008804
Maximum: 5150479
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 75.9 KiB

Quantile statistics

Minimum: 5008804
5-th percentile: 5021597.15
Q1: 5036955.75
median: 5069452.5
Q3: 5112987.75
95-th percentile: 5143323.95
Maximum: 5150479
Range: 141675
Interquartile range (IQR): 76032

Descriptive statistics

Standard deviation: 40807.0046
Coefficient of variation (CV): 0.00803902269
Kurtosis: -1.208459014
Mean: 5076115.117
Median Absolute Deviation (MAD): 35512
Skewness: 0.1265984442
Sum: 4.924846886 × 10^10
Variance: 1665211625
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
5008804 | 1 | < 0.1%
5097105 | 1 | < 0.1%
5097132 | 1 | < 0.1%
5097136 | 1 | < 0.1%
5097146 | 1 | < 0.1%
5097148 | 1 | < 0.1%
5097151 | 1 | < 0.1%
5097154 | 1 | < 0.1%
5097155 | 1 | < 0.1%
5097157 | 1 | < 0.1%
Other values (9692) | 9692 | 99.9%

Value | Count | Frequency (%)
5008804 | 1 | < 0.1%
5008806 | 1 | < 0.1%
5008808 | 1 | < 0.1%
5008812 | 1 | < 0.1%
5008815 | 1 | < 0.1%
5008819 | 1 | < 0.1%
5008825 | 1 | < 0.1%
5008827 | 1 | < 0.1%
5008830 | 1 | < 0.1%
5008834 | 1 | < 0.1%

Value | Count | Frequency (%)
5150479 | 1 | < 0.1%
5150467 | 1 | < 0.1%
5150459 | 1 | < 0.1%
5150451 | 1 | < 0.1%
5150428 | 1 | < 0.1%
5150410 | 1 | < 0.1%
5150400 | 1 | < 0.1%
5150388 | 1 | < 0.1%
5150338 | 1 | < 0.1%
5150337 | 1 | < 0.1%

CODE_GENDER
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 75.9 KiB

Value | Count
0 | 6318
1 | 3384

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row0
4th row0
5th row1

Common Values

Value | Count | Frequency (%)
0 | 6318 | 65.1%
1 | 3384 | 34.9%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0 | 6318 | 65.1%
1 | 3384 | 34.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

FLAG_OWN_CAR
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 75.9 KiB

Value | Count
0 | 6135
1 | 3567

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row0
4th row0
5th row1

Common Values

Value | Count | Frequency (%)
0 | 6135 | 63.2%
1 | 3567 | 36.8%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0 | 6135 | 63.2%
1 | 3567 | 36.8%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

FLAG_OWN_REALTY
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 75.9 KiB

Value | Count
1 | 6514
0 | 3188

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

Value | Count | Frequency (%)
1 | 6514 | 67.1%
0 | 3188 | 32.9%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1 | 6514 | 67.1%
0 | 3188 | 32.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

AMT_INCOME_TOTAL
Real number (ℝ≥0)

Distinct: 263
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 181219.8043
Minimum: 27000
Maximum: 1575000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 75.9 KiB

Quantile statistics

Minimum: 27000
5-th percentile: 67500
Q1: 112500
median: 157500
Q3: 225000
95-th percentile: 360000
Maximum: 1575000
Range: 1548000
Interquartile range (IQR): 112500

Descriptive statistics

Standard deviation: 99302.19023
Coefficient of variation (CV): 0.5479654425
Kurtosis: 15.77629382
Mean: 181219.8043
Median Absolute Deviation (MAD): 45000
Skewness: 2.659466621
Sum: 1758194541
Variance: 9860924985
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
135000 | 1138 | 11.7%
180000 | 843 | 8.7%
112500 | 842 | 8.7%
157500 | 829 | 8.5%
225000 | 749 | 7.7%
202500 | 558 | 5.8%
90000 | 535 | 5.5%
270000 | 425 | 4.4%
67500 | 265 | 2.7%
315000 | 232 | 2.4%
Other values (253) | 3286 | 33.9%

Value | Count | Frequency (%)
27000 | 2 | < 0.1%
29250 | 1 | < 0.1%
30150 | 1 | < 0.1%
31500 | 3 | < 0.1%
31531.5 | 1 | < 0.1%
31950 | 1 | < 0.1%
32400 | 1 | < 0.1%
33300 | 2 | < 0.1%
33750 | 1 | < 0.1%
36000 | 4 | < 0.1%

Value | Count | Frequency (%)
1575000 | 1 | < 0.1%
1350000 | 1 | < 0.1%
1125000 | 3 | < 0.1%
990000 | 1 | < 0.1%
945000 | 1 | < 0.1%
900000 | 10 | 0.1%
810000 | 6 | 0.1%
787500 | 1 | < 0.1%
765000 | 2 | < 0.1%
742500 | 1 | < 0.1%

NAME_INCOME_TYPE
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 75.9 KiB

Value | Count
4 | 4956
0 | 2312
1 | 1710
2 | 721
3 | 3

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row4
2nd row4
3rd row0
4th row1
5th row4

Common Values

Value | Count | Frequency (%)
4 | 4956 | 51.1%
0 | 2312 | 23.8%
1 | 1710 | 17.6%
2 | 721 | 7.4%
3 | 3 | < 0.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
4 | 4956 | 51.1%
0 | 2312 | 23.8%
1 | 1710 | 17.6%
2 | 721 | 7.4%
3 | 3 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

NAME_EDUCATION_TYPE
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 75.9 KiB

Value | Count
4 | 6757
1 | 2454
2 | 371
3 | 114
0 | 6

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row4
3rd row4
4th row1
5th row1

Common Values

Value | Count | Frequency (%)
4 | 6757 | 69.6%
1 | 2454 | 25.3%
2 | 371 | 3.8%
3 | 114 | 1.2%
0 | 6 | 0.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
4 | 6757 | 69.6%
1 | 2454 | 25.3%
2 | 371 | 3.8%
3 | 114 | 1.2%
0 | 6 | 0.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

NAME_FAMILY_STATUS
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 75.9 KiB

Value | Count
1 | 6526
3 | 1358
0 | 836
2 | 572
4 | 410

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row1
3rd row3
4th row2
5th row1

Common Values

Value | Count | Frequency (%)
1 | 6526 | 67.3%
3 | 1358 | 14.0%
0 | 836 | 8.6%
2 | 572 | 5.9%
4 | 410 | 4.2%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1 | 6526 | 67.3%
3 | 1358 | 14.0%
0 | 836 | 8.6%
2 | 572 | 5.9%
4 | 410 | 4.2%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

NAME_HOUSING_TYPE
Real number (ℝ≥0)

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.274685632
Minimum: 0
Maximum: 5
Zeros: 34
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 75.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.930142303
Coefficient of variation (CV): 0.7297032929
Kurtosis: 10.10810157
Mean: 1.274685632
Median Absolute Deviation (MAD): 0
Skewness: 3.374696287
Sum: 12367
Variance: 0.8651647038
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value | Count | Frequency (%)
1 | 8677 | 89.4%
5 | 448 | 4.6%
2 | 323 | 3.3%
4 | 144 | 1.5%
3 | 76 | 0.8%
0 | 34 | 0.4%

Value | Count | Frequency (%)
0 | 34 | 0.4%
1 | 8677 | 89.4%
2 | 323 | 3.3%
3 | 76 | 0.8%
4 | 144 | 1.5%
5 | 448 | 4.6%

Value | Count | Frequency (%)
5 | 448 | 4.6%
4 | 144 | 1.5%
3 | 76 | 0.8%
2 | 323 | 3.3%
1 | 8677 | 89.4%
0 | 34 | 0.4%

OCCUPATION_TYPE
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 19
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.220882292
Minimum: 0
Maximum: 18
Zeros: 300
Zeros (%): 3.1%
Negative: 0
Negative (%): 0.0%
Memory size: 75.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 6
median: 10
Q3: 12
95-th percentile: 15
Maximum: 18
Range: 18
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.274346628
Coefficient of variation (CV): 0.4635507202
Kurtosis: -0.6975556125
Mean: 9.220882292
Median Absolute Deviation (MAD): 2
Skewness: -0.4240024851
Sum: 89461
Variance: 18.27003909
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value | Count | Frequency (%)
12 | 2992 | 30.8%
8 | 1724 | 17.8%
15 | 958 | 9.9%
3 | 875 | 9.0%
10 | 782 | 8.1%
4 | 622 | 6.4%
6 | 357 | 3.7%
0 | 300 | 3.1%
11 | 291 | 3.0%
2 | 193 | 2.0%
Other values (9) | 608 | 6.3%

Value | Count | Frequency (%)
0 | 300 | 3.1%
1 | 146 | 1.5%
2 | 193 | 2.0%
3 | 875 | 9.0%
4 | 622 | 6.4%
5 | 22 | 0.2%
6 | 357 | 3.7%
7 | 18 | 0.2%
8 | 1724 | 17.8%
9 | 53 | 0.5%

Value | Count | Frequency (%)
18 | 39 | 0.4%
17 | 182 | 1.9%
16 | 46 | 0.5%
15 | 958 | 9.9%
14 | 16 | 0.2%
13 | 86 | 0.9%
12 | 2992 | 30.8%
11 | 291 | 3.0%
10 | 782 | 8.1%
9 | 53 | 0.5%

CNT_FAM_MEMBERS
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.179550608
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 75.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 2
Q3: 3
95-th percentile: 4
Maximum: 9
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.9062441788
Coefficient of variation (CV): 0.4157940519
Kurtosis: 1.416363506
Mean: 2.179550608
Median Absolute Deviation (MAD): 0
Skewness: 0.9572669926
Sum: 21146
Variance: 0.8212785116
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Value | Count | Frequency (%)
2 | 5178 | 53.4%
1 | 1947 | 20.1%
3 | 1635 | 16.9%
4 | 802 | 8.3%
5 | 117 | 1.2%
6 | 18 | 0.2%
7 | 4 | < 0.1%
9 | 1 | < 0.1%

Value | Count | Frequency (%)
1 | 1947 | 20.1%
2 | 5178 | 53.4%
3 | 1635 | 16.9%
4 | 802 | 8.3%
5 | 117 | 1.2%
6 | 18 | 0.2%
7 | 4 | < 0.1%
9 | 1 | < 0.1%

Value | Count | Frequency (%)
9 | 1 | < 0.1%
7 | 4 | < 0.1%
6 | 18 | 0.2%
5 | 117 | 1.2%
4 | 802 | 8.3%
3 | 1635 | 16.9%
2 | 5178 | 53.4%
1 | 1947 | 20.1%

AGE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7171
Distinct (%): 73.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.78130175
Minimum: 20.50418558
Maximum: 68.86383704
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 75.9 KiB

Quantile statistics

Minimum: 20.50418558
5-th percentile: 26.59383834
Q1: 34.05545631
median: 42.73735943
Q3: 53.56646611
95-th percentile: 63.0207328
Maximum: 68.86383704
Range: 48.35965146
Interquartile range (IQR): 19.51100981

Descriptive statistics

Standard deviation: 11.62574179
Coefficient of variation (CV): 0.2655412545
Kurtosis: -1.053156755
Mean: 43.78130175
Median Absolute Deviation (MAD): 9.548450687
Skewness: 0.1505990557
Sum: 424766.1896
Variance: 135.1578722
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
28.54268055 | 6 | 0.1%
38.66472275 | 5 | 0.1%
57.0470304 | 5 | 0.1%
48.29941751 | 5 | 0.1%
58.35027413 | 5 | 0.1%
52.10784616 | 5 | 0.1%
56.64455807 | 5 | 0.1%
40.86599999 | 5 | 0.1%
58.48990739 | 5 | 0.1%
55.04014456 | 5 | 0.1%
Other values (7161) | 9651 | 99.5%

Value | Count | Frequency (%)
20.50418558 | 1 | < 0.1%
21.09557349 | 1 | < 0.1%
21.14485581 | 1 | < 0.1%
21.23794465 | 1 | < 0.1%
21.79100187 | 1 | < 0.1%
21.84849792 | 1 | < 0.1%
22.01551024 | 1 | < 0.1%
22.05110303 | 1 | < 0.1%
22.05657885 | 1 | < 0.1%
22.08669583 | 1 | < 0.1%

Value | Count | Frequency (%)
68.86383704 | 1 | < 0.1%
68.83098216 | 1 | < 0.1%
68.71872797 | 1 | < 0.1%
68.68861099 | 1 | < 0.1%
68.47505424 | 1 | < 0.1%
68.36553796 | 1 | < 0.1%
68.34637262 | 1 | < 0.1%
68.2998282 | 1 | < 0.1%
68.2614975 | 1 | < 0.1%
68.21221517 | 1 | < 0.1%

YEARS_EMPLOYED
Real number (ℝ≥0)

ZEROS

Distinct: 3636
Distinct (%): 37.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.666517901
Minimum: 0
Maximum: 43.0207328
Zeros: 1694
Zeros (%): 17.5%
Negative: 0
Negative (%): 0.0%
Memory size: 75.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.9288349521
median: 3.761884228
Q3: 8.202769393
95-th percentile: 18.79778503
Maximum: 43.0207328
Range: 43.0207328
Interquartile range (IQR): 7.273934441

Descriptive statistics

Standard deviation: 6.343724493
Coefficient of variation (CV): 1.119510183
Kurtosis: 4.219012656
Mean: 5.666517901
Median Absolute Deviation (MAD): 3.27042992
Skewness: 1.8443603
Sum: 54976.55667
Variance: 40.24284044
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 1694 | 17.5%
0.5475814014 | 17 | 0.2%
1.09790071 | 17 | 0.2%
2.798140961 | 14 | 0.1%
0.3449762829 | 14 | 0.1%
1.839873509 | 13 | 0.1%
1.259437223 | 13 | 0.1%
0.6790009377 | 13 | 0.1%
0.6817388447 | 13 | 0.1%
2.01236165 | 12 | 0.1%
Other values (3626) | 7882 | 81.2%

Value | Count | Frequency (%)
0 | 1694 | 17.5%
0.04654441912 | 1 | < 0.1%
0.1177300013 | 1 | < 0.1%
0.1779639555 | 1 | < 0.1%
0.1807018625 | 1 | < 0.1%
0.1916534905 | 2 | < 0.1%
0.1943913975 | 1 | < 0.1%
0.1998672115 | 3 | < 0.1%
0.2135567465 | 1 | < 0.1%
0.2162946536 | 1 | < 0.1%

Value | Count | Frequency (%)
43.0207328 | 1 | < 0.1%
42.87836164 | 1 | < 0.1%
41.69011 | 1 | < 0.1%
41.26573441 | 1 | < 0.1%
41.17264557 | 1 | < 0.1%
40.75922161 | 1 | < 0.1%
40.54840277 | 1 | < 0.1%
40.45257603 | 1 | < 0.1%
39.79821625 | 1 | < 0.1%
39.62572811 | 1 | < 0.1%

STATUS
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 75.9 KiB

Value | Count
0 | 8424
1 | 1278

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row0
3rd row0
4th row0
5th row0

Common Values

Value | Count | Frequency (%)
0 | 8424 | 86.8%
1 | 1278 | 13.2%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0 | 8424 | 86.8%
1 | 1278 | 13.2%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

MONTHS_BALANCE
Real number (ℝ≥0)

Distinct: 61
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.26303855
Minimum: 0
Maximum: 60
Zeros: 57
Zeros (%): 0.6%
Negative: 0
Negative (%): 0.0%
Memory size: 75.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 13
median: 26
Q3: 41
95-th percentile: 56
Maximum: 60
Range: 60
Interquartile range (IQR): 28

Descriptive statistics

Standard deviation: 16.64688326
Coefficient of variation (CV): 0.6106026382
Kurtosis: -1.089553494
Mean: 27.26303855
Median Absolute Deviation (MAD): 14
Skewness: 0.2170390491
Sum: 264506
Variance: 277.1187224
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
11 | 219 | 2.3%
13 | 216 | 2.2%
7 | 215 | 2.2%
16 | 212 | 2.2%
15 | 210 | 2.2%
5 | 210 | 2.2%
18 | 206 | 2.1%
39 | 201 | 2.1%
6 | 197 | 2.0%
3 | 196 | 2.0%
Other values (51) | 7620 | 78.5%

Value | Count | Frequency (%)
0 | 57 | 0.6%
1 | 136 | 1.4%
2 | 167 | 1.7%
3 | 196 | 2.0%
4 | 183 | 1.9%
5 | 210 | 2.2%
6 | 197 | 2.0%
7 | 215 | 2.2%
8 | 189 | 1.9%
9 | 185 | 1.9%

Value | Count | Frequency (%)
60 | 100 | 1.0%
59 | 98 | 1.0%
58 | 111 | 1.1%
57 | 90 | 0.9%
56 | 114 | 1.2%
55 | 101 | 1.0%
54 | 102 | 1.1%
53 | 115 | 1.2%
52 | 104 | 1.1%
51 | 136 | 1.4%

Interactions

[Figures: 81 interaction plots between pairs of numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
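
The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks`, `pearson`, and `spearman` are illustrative, and giving tied values the mean of their positions is an assumption (a common convention), not something the report specifies.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing in x
print(spearman(x, y), pearson(x, y))
```

Because the cubic relationship is monotonic, ρ treats it as a perfect association, while Pearson's r on the raw values is lower.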

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
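
A minimal sketch of this formula, using made-up data, also illustrates the invariance property: shifting and rescaling each variable separately (with positive scale factors) leaves r unchanged. The function name `pearson` and the sample values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical data: y is roughly 2 * x with a little noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson(x, y)
# Change location and scale of each variable independently.
r_transformed = pearson([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_transformed)
```

The normalising factors cancel in the ratio, so the n or n - 1 denominators of the covariance and standard deviations can be omitted, as done here.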

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
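
The pair-counting definition above can be sketched directly. This is the simple form often called tau-a; the function name `kendall_tau_a` is illustrative, and many libraries report a tie-adjusted variant (tau-b) instead, so values for data with ties may differ from the ones behind this report.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # Pairs tied in x or y count as neither.
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six
print(kendall_tau_a(x, y))
```

With four observations there are six pairs; five are concordant and one is discordant, giving (5 - 1) / 6.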

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
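
The sketch below shows the bias-corrected estimator in the form it is usually written following Bergsma (2013): the chi-squared statistic is converted to φ², a correction term is subtracted, and the table dimensions are shrunk slightly before normalising. Whether the software behind this report implements exactly this form is an assumption, and the function name `cramers_v_corrected` is hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row_value, row_count in row_totals.items():
        for col_value, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row_value, col_value), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    # A constant variable gives a degenerate table; report 0 in that case.
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


labels = [0, 1] * 50
print(cramers_v_corrected(labels, labels))                         # identical labels
print(cramers_v_corrected([0, 0, 1, 1] * 25, [0, 1, 0, 1] * 25))   # balanced, unrelated
```

The first call compares a variable with itself and the second uses a perfectly balanced 2×2 table, so they exercise the two ends of the 0 to 1 range.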

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

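(index) | Unnamed: 0 | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | OCCUPATION_TYPE | CNT_FAM_MEMBERS | AGE | YEARS_EMPLOYED | STATUS | MONTHS_BALANCE
0 | 0 | 5008804 | 1 | 1 | 1 | 427500.0 | 4 | 1 | 0 | 4 | 12 | 2 | 32.868574 | 12.435574 | 1 | 15
1 | 1 | 5008806 | 1 | 1 | 1 | 112500.0 | 4 | 4 | 1 | 1 | 17 | 2 | 58.793815 | 3.104787 | 0 | 29
2 | 2 | 5008808 | 0 | 0 | 1 | 270000.0 | 0 | 4 | 3 | 1 | 15 | 1 | 52.321403 | 8.353354 | 0 | 4
3 | 3 | 5008812 | 0 | 0 | 1 | 283500.0 | 1 | 1 | 2 | 1 | 12 | 1 | 61.504343 | 0.000000 | 0 | 20
4 | 4 | 5008815 | 1 | 1 | 1 | 270000.0 | 4 | 1 | 1 | 1 | 0 | 2 | 46.193967 | 2.105450 | 0 | 5
5 | 5 | 5008819 | 1 | 1 | 1 | 135000.0 | 0 | 4 | 1 | 1 | 8 | 2 | 48.674511 | 3.269061 | 0 | 17
6 | 6 | 5008825 | 0 | 1 | 0 | 130500.0 | 4 | 2 | 1 | 1 | 0 | 2 | 29.210730 | 3.019911 | 1 | 25
7 | 7 | 5008830 | 0 | 0 | 1 | 157500.0 | 4 | 4 | 1 | 1 | 8 | 2 | 27.463945 | 4.021985 | 1 | 31
8 | 8 | 5008834 | 0 | 0 | 1 | 112500.0 | 4 | 4 | 3 | 1 | 12 | 2 | 30.029364 | 4.435409 | 0 | 44
9 | 9 | 5008836 | 1 | 1 | 1 | 270000.0 | 4 | 4 | 1 | 1 | 8 | 5 | 34.741302 | 3.184186 | 0 | 24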

Last rows

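(index) | Unnamed: 0 | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | OCCUPATION_TYPE | CNT_FAM_MEMBERS | AGE | YEARS_EMPLOYED | STATUS | MONTHS_BALANCE
9692 | 9692 | 5142973 | 1 | 0 | 0 | 180000.0 | 4 | 4 | 1 | 1 | 8 | 1 | 29.175137 | 2.535302 | 1 | 18
9693 | 9693 | 5143578 | 1 | 1 | 0 | 157500.0 | 4 | 2 | 3 | 5 | 4 | 2 | 24.980664 | 2.628391 | 1 | 14
9694 | 9694 | 5145690 | 0 | 0 | 1 | 306000.0 | 1 | 1 | 1 | 1 | 12 | 2 | 59.111412 | 0.000000 | 1 | 17
9695 | 9695 | 5145760 | 0 | 1 | 0 | 135000.0 | 4 | 1 | 1 | 1 | 12 | 2 | 42.349946 | 13.235042 | 1 | 10
9696 | 9696 | 5146078 | 0 | 0 | 1 | 108000.0 | 4 | 4 | 3 | 1 | 15 | 1 | 34.834391 | 3.099311 | 1 | 48
9697 | 9697 | 5148694 | 0 | 0 | 0 | 180000.0 | 1 | 4 | 0 | 2 | 8 | 2 | 56.400884 | 0.542106 | 1 | 20
9698 | 9698 | 5149055 | 0 | 0 | 1 | 112500.0 | 0 | 4 | 1 | 1 | 12 | 2 | 43.360233 | 7.375921 | 1 | 19
9699 | 9699 | 5149729 | 1 | 1 | 1 | 90000.0 | 4 | 4 | 1 | 1 | 12 | 2 | 52.296762 | 4.711938 | 1 | 21
9700 | 9700 | 5149838 | 0 | 0 | 1 | 157500.0 | 1 | 1 | 1 | 1 | 11 | 2 | 33.914454 | 3.627727 | 1 | 32
9701 | 9701 | 5150337 | 1 | 0 | 1 | 112500.0 | 4 | 4 | 3 | 4 | 8 | 1 | 25.155890 | 3.266323 | 1 | 13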